Solar Energy Production in Calgary

Data 601 Project

By: Sasi Challa, Zhenyu (Mike) Li, and Nathan Tell

Introduction

The exploration that our group is undertaking is all about solar energy production in Calgary. We wanted to explore how solar energy production works, which for our purposes includes what is it used for and also when can it be used. We are trying to answer these questions by taking a look at our main dataset, taken from the City of Calgary's open data website and using that dataset to explore the other problems that we are looking at. Our first step as shown below in the Jupyter notebook will be us uploading the three data sets that we need to complete our exploration, including the one on solar energy production, one on electricity consumption and finally one weather. These datsets will then be used to create visualizations on how solar energy is produced and used across Calgary. The first analysis and visualization will be exclusively on the solar energy production dataset and will look at the total energy produced per year, to see how solar energy production has changed over time. The second analysis will be on how solar energy relates to electricity consumption. Since a number of locations across the city of Calgary use solar energy, we can look at how much energy solar power can provide for these locations and which uses the city has deemed solar energy to be most viable for. The final analysis will be on how solar energy is related to weather, which will include time of year and time in the day to see when solar energy will be most prevalent. From these analysis we hope to extract an explanation of when, and how soalr energy can be used more effectively for Calgarians.

In [1]:
# to start we will also import all the packages necessary
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from time import strptime
import ipywidgets as widgets
from ipywidgets import interact
import plotly.graph_objs as go
import plotly.express as px

In the Cell below, we are importing the data for the solar energy production and cleaning it up so it can be used with the other datasets.

In [2]:
#first we import and read the csv
solar_energy_prod= pd.read_csv("./Solar_Energy_Production.csv", header = 0, names = ["name", "id", "address", "date","kWh"])
#then we convert the date cells to date time
solar_energy_prod['date'] = pd.to_datetime(solar_energy_prod['date'])
# then we set all the column names and add their values into those columns
solar_energy_prod = pd.DataFrame({'Name' : solar_energy_prod['name'], 'ID' : solar_energy_prod['id'], 
                                  'Address' : solar_energy_prod['address'], 
                                  'date' : pd.to_datetime(solar_energy_prod['date'], format = '%Y/%m/%d %I:%M:%S %p'),
                                  'Year' : solar_energy_prod['date'].dt.year, 
                                  'Month' : solar_energy_prod['date'].dt.month, 
                                  'Day': solar_energy_prod['date'].dt.day, 
                                  'Time': solar_energy_prod['date'].dt.time, 
                                  'kWh': solar_energy_prod['kWh']})

# then we can aggregate solar energy production:
# throughout the City
overall_grouped_solar_energy = solar_energy_prod.groupby(['Year','Month','Day','Time','ID']).sum()
# over each year
overall_solar_per_year = overall_grouped_solar_energy.unstack().sum(axis=1,skipna=True).unstack().sum(axis=1,skipna=True).unstack().sum(axis=1,skipna=True).unstack().sum(axis=1,skipna=True)

# then the same idea over hours of the day
# For Individual hours of the day
time_grouped_solar_energy = solar_energy_prod.groupby(['Time','Year','Month','Day','ID']).sum()
# in a single hour
time_solar_per_time = time_grouped_solar_energy.unstack().sum(axis=1,skipna=True).unstack().mean(axis=1,skipna=True).unstack().mean(axis=1,skipna=True).unstack().mean(axis=1,skipna=True)


# To consolidate time into hourly format
time_solar_per_time_df = pd.DataFrame({'Time' : time_solar_per_time.index, 'kWh' : time_solar_per_time.values})
time_solar_per_time_df['Time'] = pd.to_datetime(time_solar_per_time_df['Time'],  format = '%H:%M:%S').dt.hour
#then we set proper labels for the hour of the day
time_bins = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]
time_labels = ["00:00:00","01:00:00","02:00:00","03:00:00","04:00:00","05:00:00","06:00:00","07:00:00","08:00:00",
             "09:00:00","10:00:00","11:00:00","12:00:00","13:00:00","14:00:00","15:00:00","16:00:00","17:00:00",
             "18:00:00","19:00:00","20:00:00","21:00:00","22:00:00","23:00:00"]
# and bin them
time_solar_per_time_df['Time'] = pd.cut(time_solar_per_time_df.Time, 
                                        bins = time_bins, labels = time_labels ,right = False)
time_solar_per_time_df= time_solar_per_time_df.groupby(['Time']).sum()
In [3]:
display(overall_solar_per_year)
Year
2015      36569.783
2016     166261.716
2017    1143066.051
2018    1714572.433
2019    1725008.983
2020    1680909.719
dtype: float64
In [4]:
display(time_solar_per_time_df)
kWh
Time
00:00:00 9.302970
01:00:00 1.878793
02:00:00 0.589083
03:00:00 1.398840
04:00:00 0.530317
05:00:00 1.192463
06:00:00 10.499407
07:00:00 43.373682
08:00:00 134.570772
09:00:00 288.830216
10:00:00 458.617309
11:00:00 594.152557
12:00:00 609.430589
13:00:00 557.949894
14:00:00 511.194306
15:00:00 400.013662
16:00:00 299.673122
17:00:00 217.508785
18:00:00 127.024847
19:00:00 83.322643
20:00:00 59.531622
21:00:00 36.888937
22:00:00 31.120542
23:00:00 19.713082

Next we are reading in the file for the weather statistics and cleaned it up.

In [5]:
# then we can read in the weather stats csv
weather = pd.read_csv("./Weather_Table.csv")
# then we need to clean up the data so we only have the columns that we want
# which is "max_temperature", "min_temperature", "precipitation", "daylight", "avg_wind_speed"
# daylight is in hours, avg. wind speed is in km/hr
weather = weather[["date", 'max_temperature', "min_temperature", "precipitation", "daylight", "avg_wind_speed"]]
# from this we can also switch the dates to a datetime
weather['date'] = pd.to_datetime(weather['date'])
# then we set each of the column names and place their values in the columns
weather = pd.DataFrame({"Year": weather['date'].dt.year, 
                        "Month": weather['date'].dt.month,
                        "Day": weather['date'].dt.day,
                        "Date": weather['date'],
                        "Max Temp": weather["max_temperature"],
                        "Min Temp": weather["min_temperature"],
                        "Precipitation": weather["precipitation"],
                        "Daylight (in Hrs)": weather["daylight"],
                        "Wind Speed (avg in Km/Hr)": weather["avg_wind_speed"]})
In [6]:
# then we view the table to make sure that the data imported properly    
display(weather.head())
Year Month Day Date Max Temp Min Temp Precipitation Daylight (in Hrs) Wind Speed (avg in Km/Hr)
0 2020 10 16 2020-10-16 3.9 -3.8 1.5 10.67 15.5
1 2020 10 15 2020-10-15 5.0 -3.8 0.0 10.73 15.0
2 2020 10 14 2020-10-14 5.4 -1.9 5.9 10.80 17.0
3 2020 10 13 2020-10-13 4.3 -2.8 0.0 10.85 13.0
4 2020 10 12 2020-10-12 5.1 -0.3 2.1 10.92 10.5

Now we merge the production data table with the weather data table

In [7]:
# then from this we can merge the tables based on the dates
# becuase we only need the daily total energy production we will use solar energy production summed up per day
dategroups = solar_energy_prod.groupby(["Year","Month","Day"]).sum()
dategroups = pd.DataFrame(dategroups)

# now that both of the tables have matching keys on date we cna merge the tables.
weathersolar = pd.merge(weather, dategroups, how = 'outer', on = ["Year", "Month", "Day"])
#then because the data is only from 2017 onwards we will take only cells that have a year = 2017 or greater
weathersolar = weathersolar.loc[weathersolar.Year >= 2017, :]

# then we can set the indices for grouping and plotting
weathersolar = weathersolar.set_index(["Year", "Month", "Day"])
In [8]:
# finally we display the table to make sure the data is properly formatted
display(weathersolar)
Date Max Temp Min Temp Precipitation Daylight (in Hrs) Wind Speed (avg in Km/Hr) ID kWh
Year Month Day
2020 10 16 2020-10-16 3.9 -3.8 1.5 10.67 15.5 38251030.0 2122.694
15 2020-10-15 5.0 -3.8 0.0 10.73 15.0 42076133.0 4812.953
14 2020-10-14 5.4 -1.9 5.9 10.80 17.0 38828680.0 2849.602
13 2020-10-13 4.3 -2.8 0.0 10.85 13.0 42076133.0 1746.247
12 2020-10-12 5.1 -0.3 2.1 10.92 10.5 42076133.0 1742.081
... ... ... ... ... ... ... ... ... ... ...
2017 1 3 2017-01-03 -15.5 -22.4 0.3 8.05 6.5 2649166.0 2.564
2 2017-01-02 -16.1 -26.1 0.0 8.02 4.5 2152825.0 1.628
1 2017-01-01 -7.8 -17.4 2.0 8.00 18.0 995703.0 0.008
2020 10 17 NaT NaN NaN NaN NaN NaN 35753668.0 596.690
18 NaT NaN NaN NaN NaN NaN 34938241.0 309.915

1387 rows × 8 columns

The reason that we will only be using data from 2017 on for the weather table is because a significant proportion of the solar energy production installations were either not producing a lot of energy or were not actually installed before 2017. Once we reach 2017 we can see that there is a significant amount of solar energy being produced and utilized by the locations described above in question 3, meaning that the overall solar energy production will be a better view at how the weather statistics will affect solar energy production in Calgary by looking at a number of different locations that are producing solar energy as opposed to only a few locations that may give biased results since weather can fluctuate across the city.

Then we can import the final table which is for electricity consumption, which we will also clean up and then use to plot the data.

In [9]:
consumption = pd.read_csv('Corporate_Energy_Consumption.csv')

# Keep Solar Power
consumption = consumption[consumption['Energy Description'].isin(['Solar Power','Electricity'])]

# Format the monthly data of consumption
consumption['Month'] = consumption['Month'].apply(lambda x: strptime(x, '%b').tm_mon)

# Keep data from '2015-9' - '2020-10'
consumption = consumption[~(consumption['Year']==2014) | ((consumption['Year']==2015) & (consumption['Month']<9))]

# Aggreagate data by month
consumption = consumption.groupby(['Business Unit Desc','Year','Month'],as_index=False )['Total Consumption'].sum()

# consumption year month
consumption['YearMonth'] = consumption['Year'].astype(str)+'-'+ consumption['Month'].astype(str)

# Unify the time range for each address
# 'YearMonth' is used to creating the unique date range (2015-9 to 2020-10)
# 'date_range' is used to create the unified date range for each address
addresses = pd.DataFrame(consumption['Business Unit Desc'].unique(), columns = ['Business Unit Desc'])
date_range = pd.DataFrame(consumption['YearMonth'].unique(),columns = ['date_range'])
addresses['key'] = 1
date_range['key'] = 1
consumption_monthly_same_range = pd.merge(addresses,date_range,on = 'key',how = 'outer').drop('key',1)
# then merge the tables into the final table that we will use to plot
consumption_monthly_ready = pd.merge(consumption, consumption_monthly_same_range, left_on = ['Business Unit Desc','YearMonth'], right_on = ['Business Unit Desc','date_range'], how = 'right')
consumption_monthly_ready['Total Consumption'] = consumption_monthly_ready['Total Consumption']
consumption_monthly_ready['Total Consumption'] = consumption_monthly_ready['Total Consumption'].fillna(0)
C:\Users\BEAST\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3071: DtypeWarning: Columns (2) have mixed types.Specify dtype option on import or set low_memory=False.
  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
In [10]:
# finally we display the table to make sure that the data is formatted properly
consumption_monthly_ready
Out[10]:
Business Unit Desc Year Month Total Consumption YearMonth date_range
0 CPS - Bureaus 2015.0 1.0 1642139.0 2015-1 2015-1
1 CPS - Bureaus 2015.0 2.0 1500068.0 2015-2 2015-2
2 CPS - Bureaus 2015.0 3.0 1629950.0 2015-3 2015-3
3 CPS - Bureaus 2015.0 4.0 1579039.0 2015-4 2015-4
4 CPS - Bureaus 2015.0 5.0 1639249.0 2015-5 2015-5
... ... ... ... ... ... ...
1355 Transportation Infrastructure NaN NaN 0.0 NaN 2018-2
1356 Transportation Infrastructure NaN NaN 0.0 NaN 2018-3
1357 Transportation Infrastructure NaN NaN 0.0 NaN 2018-4
1358 Transportation Infrastructure NaN NaN 0.0 NaN 2018-5
1359 Transportation Infrastructure NaN NaN 0.0 NaN 2018-6

1360 rows × 6 columns

Question 1

How has solar energy production grown over time?

This analysis takes a look at the solar energy production and aggregates the values for each of the years in our data set (2016-2020). 2015 was ommitted as the data only starts in September. 2020 was not omitted, even though the year is not complete, as it is the current year and has 9.5 months of data, versus only 4 months in the case of 2015. 2020 although the year is still not complete, it still has values comparable to the previous years. This visualization will show us how solar energy has grown in Calgary and show why it is an interesting topic to study.

In [11]:
# plotting the data
overall_solar_per_year.loc[2016:2020].plot(kind='bar',color='r', fontsize = 15,figsize=(12,7))
plt.title("Increase in Solar Energy Production over time in the City of Calgary", fontsize = 20)
plt.xlabel("Year", fontsize = 15)
plt.ylabel("Solar Production Per Year (in kWh)", fontsize = 15)
plt.show()

Although this is a very simple graph it is very important to our analysis. From the graph above we can see that there was a signifcant jump in energy production from 2016 to 2017, and then again from 2017-2018, after that the graph evens out, likely due to the fact that there were not more solar energy panels installed since 2018.

Question 2

How has solar energy grown with consumption?

With this analysis we are trying to explore how the consumption has grown with production. Although it is great to know that Calgarians are producing solar energy, the more important part of the energy production is how this energy is thereafter used.

In [12]:
# Create Consumption Bar Chart
barchart = px.bar(
    data_frame = consumption_monthly_ready,
    x = 'Business Unit Desc',
    y = 'Total Consumption',
    color = 'Business Unit Desc',
    opacity = 0.9,
    title = 'Total Electricity Consumption by location',
    labels={'Total Consumption':'Total Consumption', 'Business Unit Desc':''},
    width = 1000,
    height = 600,
    animation_frame= 'date_range',
    range_y = [0,14000000]
)
barchart.update_layout(xaxis = dict( tickfont = dict(size=8)))
barchart.show()

The graph above shows over time the different locations total energy usage. As we can see from this graph the top consumers of energy in the City of Calgary are water services, Calgary Transit and roads.

The next visualization will be a breakdown of the production of solar energy by location so that we can see after how the production at each of the locations relates to the consumption at each of the locations.

In [13]:
production_total = solar_energy_prod.groupby(['Address'],as_index = False)['kWh'].sum().sort_values(by = 'kWh', ascending = False)
# display(production_total)
production_total_bar = px.bar(

    data_frame=production_total,
    x="Address",
    y="kWh",
    color="Address",              
    opacity=0.6,               
    orientation="v",             
    barmode='relative',
    title = 'Total Solar Energy Production by location till date'
)
production_total_bar.show()

As we can see in the graph above and similar to the consumption graph it seems that the water services plants are the heaviest producers of solar energy, where 11444 Bearspaw Dam RD NW is the address for the Bearspaw Water Treatment Facility, and the third highest is 1634 56 AV SW which is the Glenmore Water Treatment Plant.

In [14]:
# Next to ensure we have the correct data for the visualization,
# we will consolidate solar energy production into monthly data
solar_energy_prod = solar_energy_prod.set_index('date')
production_monthly = solar_energy_prod.groupby(['Name','ID','Address','Month','Year']).resample('M')['kWh'].sum().reset_index().sort_values('date')

# Creat 'YearMonth' column for Plotly animation
production_monthly['YearMonth'] = production_monthly['Year'].astype(str)+'-'+ production_monthly['Month'].astype(str)
display(production_monthly)
Name ID Address Month Year date kWh YearMonth
380 Southland Leisure Centre 164440 2000 SOUTHLAND DR SW 9 2015 2015-09-30 15475.746 2015-9
386 Southland Leisure Centre 164440 2000 SOUTHLAND DR SW 10 2015 2015-10-31 15068.149 2015-10
392 Southland Leisure Centre 164440 2000 SOUTHLAND DR SW 11 2015 2015-11-30 4302.924 2015-11
397 Southland Leisure Centre 164440 2000 SOUTHLAND DR SW 12 2015 2015-12-31 1722.964 2015-12
340 Southland Leisure Centre 164440 2000 SOUTHLAND DR SW 1 2016 2016-01-31 3081.863 2016-1
... ... ... ... ... ... ... ... ...
443 Whitehorn Multi-Service Centre 319086 3705 35 ST NE 10 2020 2020-10-31 16287.531 2020-10
391 Southland Leisure Centre 164440 2000 SOUTHLAND DR SW 10 2020 2020-10-31 6331.498 2020-10
204 Glenmore Water Treatment Plant 355827 1634 56 AV SW 10 2020 2020-10-31 11584.457 2020-10
248 Hillhurst Sunnyside Community Association 308057 1320 5 AV NW 10 2020 2020-10-31 1348.452 2020-10
66 CFD Firehall #7 551172 2708 4 ST NW 10 2020 2020-10-31 1312.006 2020-10

450 rows × 8 columns

In [15]:
# Unify the time range for each address
# 'YearMonth' is used to creating the unique date range (2015-9 to 2020-10)
# 'date_range' is used to create the unified date range for each address
addresses = pd.DataFrame(production_monthly['Address'].unique(), columns = ['Address'])
date_range = pd.DataFrame(production_monthly['YearMonth'].unique(),columns = ['date_range'])
addresses['key'] = 1
date_range['key'] = 1
production_monthly_same_range = pd.merge(addresses,date_range,on = 'key',how = 'outer').drop('key',1)

production_monthly_ready = pd.merge(production_monthly, production_monthly_same_range, left_on = ['Address','YearMonth'], right_on = ['Address','date_range'], how = 'right')
production_monthly_ready['kWh'] = production_monthly_ready['kWh']
production_monthly_ready
Out[15]:
Name ID Address Month Year date kWh YearMonth date_range
0 Southland Leisure Centre 164440.0 2000 SOUTHLAND DR SW 9.0 2015.0 2015-09-30 15475.746 2015-9 2015-9
1 Southland Leisure Centre 164440.0 2000 SOUTHLAND DR SW 10.0 2015.0 2015-10-31 15068.149 2015-10 2015-10
2 Southland Leisure Centre 164440.0 2000 SOUTHLAND DR SW 11.0 2015.0 2015-11-30 4302.924 2015-11 2015-11
3 Southland Leisure Centre 164440.0 2000 SOUTHLAND DR SW 12.0 2015.0 2015-12-31 1722.964 2015-12 2015-12
4 Southland Leisure Centre 164440.0 2000 SOUTHLAND DR SW 1.0 2016.0 2016-01-31 3081.863 2016-1 2016-1
... ... ... ... ... ... ... ... ... ...
677 NaN NaN 11444 Bearspaw Dam RD NW NaN NaN NaT NaN NaN 2017-6
678 NaN NaN 11444 Bearspaw Dam RD NW NaN NaN NaT NaN NaN 2017-7
679 NaN NaN 11444 Bearspaw Dam RD NW NaN NaN NaT NaN NaN 2017-8
680 NaN NaN 11444 Bearspaw Dam RD NW NaN NaN NaT NaN NaN 2017-9
681 NaN NaN 11444 Bearspaw Dam RD NW NaN NaN NaT NaN NaN 2017-10

682 rows × 9 columns

In [16]:
# Create Production Bar Chart
barchart = px.bar(
    data_frame = production_monthly_ready,
    x = 'Address',
    y = 'kWh',
    color = 'Address',
    opacity = 0.9,
    title = 'Solar Energy Production by location over time',
    labels={'kWh':'Total Production', 'address':''},
    width = 1000,
    height = 600,
    animation_frame= 'date_range',
    range_y = [0,110000]
)
barchart.update_layout(xaxis = dict( tickfont = dict(size=8)))
barchart.show()

Similar to the graph above, we are using this graph to see how the production at each of the locations has fluctuated over time.

In [17]:
# we have to import the data one more time for consumption to start with fresh data
# this is meant to revert the changes done for the other graphs so we can use the original data again.
consumption = pd.read_csv('Corporate_Energy_Consumption.csv')
consumption = consumption[consumption['Energy Description'].isin(['Solar Power','Electricity'])]

# Format the monthly data of consumption
consumption['Month'] = consumption['Month'].apply(lambda x: strptime(x, '%b').tm_mon)

# Keep data from '2015-9' - '2020-10'
consumption = consumption[~(consumption['Year']==2014) | ((consumption['Year']==2015) & (consumption['Month']<9))]

# Aggreagate data by month
consumption = consumption.groupby(['FacilityAddress','Year','Month'],as_index=False )['Total Consumption'].sum()

# Address reformat
production_monthly['Address'] = production_monthly['Address'].apply(lambda x: x.lower())
consumption['FacilityAddress'] = consumption['FacilityAddress'].apply(lambda x: x.lower() if x is not np.NaN else x)

# Merge consumption and production based on date and address
merged = pd.merge(consumption, production_monthly, left_on = ['FacilityAddress','Year','Month'], right_on = ['Address','Year','Month'], how = 'inner')

# Calculate percent consumption nad production
merged['percent.cons.prod'] = (merged['kWh'] / merged['Total Consumption']).round(2)
C:\Users\BEAST\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3071: DtypeWarning:

Columns (2) have mixed types.Specify dtype option on import or set low_memory=False.

In [18]:
merged
Out[18]:
FacilityAddress Year Month Total Consumption Name ID Address date kWh YearMonth percent.cons.prod
0 11444 bearspaw dam rd nw 2017 11 58433 Bearspaw Water Treatment Plant 577650 11444 bearspaw dam rd nw 2017-11-30 6282.511 2017-11 0.11
1 11444 bearspaw dam rd nw 2017 12 65206 Bearspaw Water Treatment Plant 577650 11444 bearspaw dam rd nw 2017-12-31 12986.017 2017-12 0.20
2 11444 bearspaw dam rd nw 2018 1 70727 Bearspaw Water Treatment Plant 577650 11444 bearspaw dam rd nw 2018-01-31 24984.565 2018-1 0.35
3 11444 bearspaw dam rd nw 2018 2 63999 Bearspaw Water Treatment Plant 577650 11444 bearspaw dam rd nw 2018-02-28 13708.372 2018-2 0.21
4 11444 bearspaw dam rd nw 2018 3 66184 Bearspaw Water Treatment Plant 577650 11444 bearspaw dam rd nw 2018-03-31 56143.460 2018-3 0.85
... ... ... ... ... ... ... ... ... ... ... ...
250 651m 25 av se 2020 4 26662 Manchester Building M 570079 651m 25 av se 2020-04-30 4697.606 2020-4 0.18
251 651m 25 av se 2020 5 25195 Manchester Building M 570079 651m 25 av se 2020-05-31 5295.992 2020-5 0.21
252 651m 25 av se 2020 6 30004 Manchester Building M 570079 651m 25 av se 2020-06-30 5517.547 2020-6 0.18
253 651m 25 av se 2020 7 27252 Manchester Building M 570079 651m 25 av se 2020-07-31 6777.535 2020-7 0.25
254 651m 25 av se 2020 8 25741 Manchester Building M 570079 651m 25 av se 2020-08-31 6599.162 2020-8 0.26

255 rows × 11 columns

In [19]:
# Unify the time range for each location
merged_address = pd.DataFrame(merged.Address.unique(),columns = ['Address'])
merged_date_range = pd.DataFrame(merged['YearMonth'].unique(),columns = ['date_range'])
merged_address['key'] = 1
merged_date_range['key'] = 1
merged_unified_date_range = pd.merge(merged_address,merged_date_range,on = 'key',how = 'outer').drop('key',1)


merged_monthly_ready = pd.merge(merged, merged_unified_date_range, left_on = ['Address','YearMonth'], right_on = ['Address','date_range'], how = 'right')
merged_monthly_ready = merged_monthly_ready.sort_values(['Address','date_range'])
In [20]:
# then we can plot how production and consumption relates to each other at each of the locations.
barchart = px.bar(
    data_frame = merged_monthly_ready,
    x = 'Address',
    y = 'percent.cons.prod',
    color = 'Address',
    opacity = 0.9,
    title = 'Solar Production as a Percentage of Overall Electricity Consumption',
    labels={'percent.cons.prod':'Production as Percentage of Consumption (%)', 'address':''},
    width = 1000,
    height = 600,
    animation_frame= 'date_range',
    range_y = [0,6.1]
)
barchart.update_layout(xaxis = dict( tickfont = dict(size=8)))
barchart.show()

In this graph we can see that over time it is Southland Leisure Center at 2000 Southland DR SW that is the most consistent user of solar energy, but once we near 2018, the Bearspaw Water Treatment is easily the largest user of the solar energy that they produce. This helps to verify what we have see in each of the last 2 graphs.

Question 3

How does solar energy production change with weather (including the time of year and time of Day)?

Each of the graphs will have kWh production shown as a blue line chart with the other variable layed on top of it. The graphs will be plotted in the following order.

  1. kWh Production vs. Precipitation
  2. kWh Production vs. Maximum Temperature
  3. kWh Production vs. Minimum Temperature
  4. kWh Production vs. Daylight Hours - This will also show the distribution of solar energy throughout the year and throughout the day.
    The reason that we chose these variables is becasue they are the indicators that we believed would affect solar energy production the most. The graphs below will attempt to see how much these weather readings actually will follow the amount of solar energy production and show if a relationship can be inferred.

kWh Production vs. Precipitation

In [21]:
# we will start by plotting the amount of precipitation and kWh production
fig = go.Figure(go.Scatter(x=weathersolar["Date"], y=weathersolar["kWh"],
                    mode='lines',
                    name='kWh Energy Production'))
fig.add_trace(go.Bar(x=weathersolar["Date"], y=weathersolar["Precipitation"],
                    name='Precipitation (in mm)',
                    yaxis = "y2"))
fig.update_layout(
    yaxis=dict(
        title="kWh Production",
        titlefont=dict(
            color="#1f77b4"),
        tickfont=dict(
            color="#1f77b4")),
    yaxis2=dict(
        title="Precipitaiton (in mm)",
        titlefont=dict(
            color="#000000"
        ),
        tickfont=dict(
            color="#000000"
        ),
        anchor="free",
        overlaying="y",
        side="right",
        position=0.95))
fig.update_xaxes(
    rangeslider_visible = True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1m", step="month",stepmode="backward"),
            dict(count=3, label="3m", step="month",stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year",stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(step="all")])))
print("kWh Production vs. Precipitation")
fig.show()
kWh Production vs. Precipitation

The graph above shows the plots of Precipitation in mm and solar energy production. These two variables seem to be negatively correlated. This is because we can see when the bars of the graph for precipitation rise the solar energy production seems to fall, and when precipitation falls solar energy production seems to increase. This makes sense as when theree is precipitation there is cloud cover, meaning less exposure to the sun for the solar panels.

kWh Production vs. Maximum Temperature

In [22]:
# then we can plot the maximum temperature and kWh production 
fig = go.Figure(go.Scatter(x=weathersolar["Date"], y=weathersolar["kWh"],
                    mode='lines',
                    name='kWh Energy Production'))
fig.add_trace(go.Scatter(x=weathersolar["Date"], y=weathersolar["Max Temp"],
                    name='Max Temperature (in Degrees C)',
                    yaxis = "y2"))
fig.update_layout(
    yaxis=dict(
        title="kWh Production",
        titlefont=dict(
            color="#1f77b4"),
        tickfont=dict(
            color="#1f77b4")),
    yaxis2=dict(
        title= "Temp. Deg. C",
        titlefont=dict(
            color="#000000"
        ),
        tickfont=dict(
            color="#000000"
        ),
        anchor="free",
        overlaying="y",
        side="right",
        position=0.95))
fig.update_xaxes(
    rangeslider_visible = True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1m", step="month",stepmode="backward"),
            dict(count=3, label="3m", step="month",stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year",stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(step="all")])))
print("kWh Production vs. Maximum Temperature")
fig.show()
kWh Production vs. Maximum Temperature

As seen in the graph above, the maximum temperature does seem to have some sort of correlation to the amount of Solar Energy produced on a given day. This can be seen in the peaks and troughs of the two line graphs, which coincide with each other very often. This is probably due to the fact that a higher maximum temperature means more sun exposure and/or more intense sun exposure and therefore will have higher output levels.

kWh Production vs. Minimum Temperature

In [23]:
# then we plot the minimum temperature and kWh production
fig = go.Figure(go.Scatter(x=weathersolar["Date"], y=weathersolar["kWh"],
                    mode='lines',
                    name='kWh Energy Production'))
fig.add_trace(go.Scatter(x=weathersolar["Date"], y=weathersolar["Min Temp"],
                    mode = "lines",
                    name='Min Temperature (in Degrees C)',
                    yaxis = "y2"))
fig.update_layout(
    yaxis=dict(
        title="kWh Production",
        titlefont=dict(
            color="#1f77b4"),
        tickfont=dict(
            color="#1f77b4")),
    yaxis2=dict(
        title="Temp. Deg. C",
        titlefont=dict(
            color="#000000"
        ),
        tickfont=dict(
            color="#000000"
        ),
        anchor="free",
        overlaying="y",
        side="right",
        position=0.95))
fig.update_xaxes(
    rangeslider_visible = True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1m", step="month",stepmode="backward"),
            dict(count=3, label="3m", step="month",stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year",stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(step="all")])))
print("kWh Production vs. Minimum Temperature")
fig.show()
kWh Production vs. Minimum Temperature

As seen in the graph regarding minimum temperature, there does not seem to be much of a correaltion. Although the minimum temperature may correlate with a a lower energy production, the graphs lined up together do not seem to spike and/or follow the same trends leading to the likleihood that there is little correlation betwen these two varibales.

Average solar Energy Production through the day

In [24]:
# plot for distribution of solar energy production throughout the day
time_solar_per_time_df.plot(kind='bar',color='r',figsize=(16,9),fontsize=15)
plt.title("Average solar Energy Production per day",fontsize=20)
plt.xlabel("Time",fontsize=15)
plt.ylabel("Solar Production Per Hours in day (in kWh)",fontsize=15)
plt.show()

kWh Production vs. Daylight Hours

In [25]:
# then we plot the daylight hours and the kWh production
fig = go.Figure(go.Scatter(x=weathersolar["Date"], y=weathersolar["kWh"],
                    mode='lines',
                    name='kWh Energy Production'))
fig.add_trace(go.Scatter(x=weathersolar["Date"], y=weathersolar["Daylight (in Hrs)"],
                    name='Amount of Daylight (in Hours)',
                    yaxis = "y2"))
fig.update_layout(
    yaxis=dict(
        title="kWh Production",
        titlefont=dict(
            color="#1f77b4"),
        tickfont=dict(
            color="#1f77b4")),
    yaxis2=dict(
        title="Amount of Daylight (in Hours)",
        titlefont=dict(
            color="#000000"
        ),
        tickfont=dict(
            color="#000000"
        ),
        anchor="free",
        overlaying="y",
        side="right",
        position=0.95))
fig.update_xaxes(
    rangeslider_visible = True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1m", step="month",stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year",stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(step="all")])))
print("kWh Production vs. Daylight Hours")
fig.show()
kWh Production vs. Daylight Hours

The graph that shows the amount of daylight in hours, almost perfectly represents the trends that are seen in the produciton of Solar Energy. In the summer months when there is more daylight it can be seen that both daylight hours and solar enrgy production rise. Then in the winter months there is much lower energy output and much less daylight, leading to the fact that Solar Energy is heavily positively correlated to Dayligh Hours. This graph is also able to show a more in depth view of the trends over time, as we can see that in 2017 there is less solar enregy production than in 2018, which is our max year as seen in question 1 above. Then it is able to break down how the solar energy production is broken down by month. Below is the graph that also shows the graph for the daily distribution of solar energy, from this we can see that it is very nicely normally distributed, with the most solar energy coming in the middle of the day, which makes sense becuase that is when the sun shines the brightest.

Conclusion

From our exploration we can confirm many different things that are interesting about solar energy production in Calgary over time. Solar energy has increased over time with low totals in 2015 growing to a significant energy source in 2020 as seen in the production versus consumption section of our exploration. Once we were able to see that it was a viable energy source and used mainly for transportation and water treatment facilites, from that consumption versus production exploration, we were able to look at the efficiency and ability to use the energy at different times in the year using the weather statistics. We could see from each of the plots that there were negatively related variables such as precipitation, and then there were positively related variables which inlcuded maximum temperature and the number of daylight hours. When we break down this information further we can see that solar energy is very useful in the summer and posts the highest solar enrgy production totals, we can also see that the hotter the day is the likelihood is there will be higher production levels as well. When there is cloud cover in the form of precipitation there is less potential for solar energy production. The culmination of this information leads us to believe that using solar energy for water treatment facilities, or transportation in hot dry locations, will lead to optimal solar energy consumption and should be considered first for new installations as solar energy continues to grow as given by the data for the City of Calgary.

Bibliography

Corporate Analytics and Innovation, The City of Calgary. (Updated Daily) Solar Energy Production [Online] Available at: https://data.calgary.ca/Environment/Solar-Energy-Production/ytdn-2qsp (Accessed: September 23, 2020)

Corporate Analytics and Innovation, The City of Calgary. (Updated Daily) Primary Electricity Usage [Online] Available at: https://data.calgary.ca/Environment/Solar-Energy-Production/ytdn-2qsp (Accessed: September 23, 2020)

Plotly. Line Charts in Python,[Online] Available at: https://plotly.com/python/line-charts/ (Accessed: Oct. 17, 2020)

Plotly. Multiple Axes in Python, [Online] Available at: https://plotly.com/python/multiple-axes/ (Accessed: Oct. 18, 2020)

Plotly. Time Series and Date Axes in Python, [Online] Avaialble at: https://plotly.com/python/time-series/#time-series-with-range-slider (Accessed: Oct. 18, 2020)

weatherstats.ca based on Environment and Climate Change Canada data. (Updated Every 5 Minutes) Climate Daily/Forecast/Sun [Online] Available at: https://calgary.weatherstats.ca/download.html (Accessed: Oct. 17, 2020)

In [ ]: